# Linear Classification in the Streaming Model: Improved Bounds for Heavy Hitters

This code supplements our NeurIPS 2021 submission. It is largely based on the repository at https://github.com/stanford-futuredata/wmsketch, which accompanies the SIGMOD 2018 paper "Sketching Linear Classifiers over Data Streams" by Tai et al. The code can be used to empirically compare the WM-Sketch algorithm of Tai et al. with our two algorithms for l2 point query: (i) the black-box reduction to standard l2 point query (implemented in black_box_reduction.h and black_box_reduction.cpp) and (ii) our algorithm that uses a JL matrix as a point query data structure (implemented in jl_recovery_sketch.h and jl_recovery_sketch.cpp). Both implementations are modifications of the WM-Sketch implementation by Tai et al., which we also use as our baseline (WM-Sketch is in the files logistic_sketch.h and logistic_sketch.cpp).

# Building the Project

To compile, run the following from the repository root:

```
cd build
make -f run_experiments build
```

This produces the executable wmsketch_classification, which we used to generate our empirical results.

To delete all files produced during compilation, run the following from the repository root:

```
cd build
make -f run_experiments remove_build
```

# Downloading the Datasets

To download the three datasets we use, run the following from the repository root:

```
cd Datasets
make -f download_datasets download_data
```

To delete these datasets, run the following from the repository root:

```
cd Datasets
make -f download_datasets delete_datasets
```

# Running the Experiments

To run the algorithms, first compile the project and download the datasets as described above. Then run the commands below from the "build" directory. To run our l2 point query algorithm based on the black-box reduction to standard l2 point query, run:

```
make -f run_experiments run_experiments_black_box_reduction trial_num=TN
```

where TN is a trial number of your choice. This creates a folder named JSON_Results_black_box_reduction_Trial_TN containing JSON files with the estimates of the top weights, obtained using Algorithm 2 of our paper for point query. The folder contains one JSON file for each setting of the parameters (i.e., sketch dimensions, l2 regularization).
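
As a quick sanity check after a run, the following sketch counts the JSON files in a results folder; the count should match the number of parameter settings. The helper name count_json_results and the .json file extension are assumptions made for illustration and are not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): count the JSON files
# directly inside a results folder. Assumes the files end in ".json".
count_json_results() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing folder: $dir" >&2
    return 1
  fi
  # List regular files in the folder itself and count them.
  find "$dir" -maxdepth 1 -type f -name '*.json' | wc -l | tr -d ' '
}

# Example: check the folder for trial 1 (set TN to use another trial number).
# "|| true" keeps the script going if the experiment has not been run yet.
count_json_results "JSON_Results_black_box_reduction_Trial_${TN:-1}" || true
```

Run it from the "build" directory, since that is where the results folders are created.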

To run the algorithm of Tai et al. (2018), run:

```
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=logistic_sketch trial_num=TN
```

where TN is again a trial number of your choice. To run our l2 point query algorithm that uses a JL matrix as a point query data structure, run:

```
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=jl_recovery_sketch trial_num=TN
```

where TN is a trial number of your choice. Finally, to run online logistic regression without sketching, run:

```
make -f run_experiments run_logistic_regression
```

# Plotting the Results

To reproduce the plots shown in our paper, run all three sketching algorithms (our black-box reduction-based algorithm, our algorithm that uses a JL matrix as a point query data structure, and the algorithm of Tai et al.) as described in the previous section, and also run online logistic regression without sketching. Specifically, run the following commands in the "build" directory:

```
make -f run_experiments run_experiments_black_box_reduction trial_num=1
make -f run_experiments run_experiments_black_box_reduction trial_num=2
make -f run_experiments run_experiments_black_box_reduction trial_num=3
make -f run_experiments run_experiments_black_box_reduction trial_num=4
make -f run_experiments run_experiments_black_box_reduction trial_num=5

make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=logistic_sketch trial_num=1
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=logistic_sketch trial_num=2
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=logistic_sketch trial_num=3
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=logistic_sketch trial_num=4
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=logistic_sketch trial_num=5

make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=jl_recovery_sketch trial_num=1
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=jl_recovery_sketch trial_num=2
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=jl_recovery_sketch trial_num=3
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=jl_recovery_sketch trial_num=4
make -f run_experiments run_experiments_wm_and_jl_sketch algorithm=jl_recovery_sketch trial_num=5

make -f run_experiments run_logistic_regression
```
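
The same sequence of commands can be issued with a loop over trial numbers. The following is a minimal sketch, not a script shipped with this repository: the function names run_cmd and run_all_trials and the DRY_RUN variable are assumptions introduced here. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: issue the commands listed above with a loop over trial numbers.
# DRY_RUN=1 (the default chosen for this sketch) only prints each command;
# set DRY_RUN=0 to actually invoke make. Run from the "build" directory.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run every algorithm for trials 1..num_trials (default 5), then the
# unsketched logistic regression baseline once.
run_all_trials() {
  local num_trials="${1:-5}"
  local t alg
  for t in $(seq 1 "$num_trials"); do
    run_cmd make -f run_experiments run_experiments_black_box_reduction trial_num="$t"
  done
  for alg in logistic_sketch jl_recovery_sketch; do
    for t in $(seq 1 "$num_trials"); do
      run_cmd make -f run_experiments run_experiments_wm_and_jl_sketch algorithm="$alg" trial_num="$t"
    done
  done
  run_cmd make -f run_experiments run_logistic_regression
}

run_all_trials 5
```

With the default settings this prints the 16 commands in the order listed above, which makes it easy to review them before running with DRY_RUN=0.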

Once the folders produced by each of these commands are in the "build" directory, run plotting.py. This will create the desired plots in the "build" directory.